Uncertainty Estimation and Analysis of Categorical Web Data
Abstract
Web data often manifest high levels of uncertainty. We focus on categorical Web data and represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to account for possibly unseen categories in our samples using the Dirichlet process. We conclude by exemplifying how these higher-order models can be used as a basis for analyzing datasets once at least part of their uncertainty has been taken into account. We demonstrate how to use the Bhattacharyya statistical distance to quantify the similarity between Dirichlet distributions, and use such results to analyze a Web dataset of piracy attacks both visually and automatically.
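The abstract names two concrete operations: updating a Dirichlet-Multinomial model with observed category counts, and comparing the resulting Dirichlet distributions with the Bhattacharyya distance. The Python sketch below illustrates both under stated assumptions: a symmetric Dirichlet prior with concentration 1 per category, and the closed form D_B = 0.5 * (ln B(alpha) + ln B(beta)) - ln B((alpha + beta) / 2), where B is the multivariate Beta function. The helper names and the example counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def log_multivariate_beta(alpha: Sequence[float]) -> float:
    """ln B(alpha) = sum_i ln Gamma(alpha_i) - ln Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_posterior(counts: Sequence[int], prior: float = 1.0) -> list[float]:
    """Conjugate Dirichlet-Multinomial update: posterior alpha = prior + counts.

    Assumption: a symmetric Dirichlet prior with concentration `prior`
    for every category (prior=1.0 is the uniform prior).
    """
    return [prior + c for c in counts]


def bhattacharyya_dirichlet(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Bhattacharyya distance -ln BC between Dir(alpha) and Dir(beta).

    The coefficient is BC = B((alpha + beta) / 2) / sqrt(B(alpha) * B(beta));
    it is evaluated in log space so large counts do not overflow Gamma.
    """
    if len(alpha) != len(beta):
        raise ValueError("both distributions must have the same number of categories")
    mid = [(a + b) / 2.0 for a, b in zip(alpha, beta)]
    return 0.5 * (log_multivariate_beta(alpha) + log_multivariate_beta(beta)) - log_multivariate_beta(mid)


if __name__ == "__main__":
    # Hypothetical counts over the same three categories from two sources.
    source_a = dirichlet_posterior([12, 3, 0])
    source_b = dirichlet_posterior([10, 4, 1])
    print(bhattacharyya_dirichlet(source_a, source_b))
```

A distance of 0 means the two Dirichlet distributions are identical, and larger values mean less overlap. Because each posterior keeps its full concentration vector rather than a point estimate, two sources with similar proportions but very different sample sizes still come out as distinguishable, which is the second-order uncertainty the abstract refers to.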
Similar resources
Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A case Study of Semi-Concentrated Doctoral Exam
Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...
Joint Bayesian Stochastic Inversion of Well Logs and Seismic Data for Volumetric Uncertainty Analysis
Herein, an application of a new seismic inversion algorithm in one of Iran’s oilfields is described. Stochastic (geostatistical) seismic inversion, as a complementary method to deterministic inversion, is perceived as contribution combination of geostatistics and seismic inversion algorithm. This method integrates information from different data sources with different scales, as prior informat...
Estimating Uncertainty of Categorical Web Data
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possibly unseen categories in our samples by using t...
Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests
A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function. As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...
Application of truncated gaussian simulation to ore-waste boundary modeling of Golgohar iron deposit
Truncated Gaussian Simulation (TGS) is a well-known method to generate realizations of the ore domains located in a spatial sequence. In a geostatistical framework, geological domains are normally utilized for the stationarity assumption. The ability to measure the uncertainty in the exact locations of the boundaries among different geological units is a common challenge for practitioners. As a simple a...